Add fix_task to verifier.json: mechanical-vs-authority repair routing by pengfei-threemoonslab · Pull Request #146 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-30T05:58:27Z

What

Adds fix_task to verifier.json — the single, deterministic repair instruction a verify run hands to whoever must act next. Stacked on #145 (the verdict-contract lock); review that first.

Why

A merge verdict of blocked / human_review_required is just friction unless it also says what to do next and who can safely do it. fix_task is that contract, and its routing encodes the product's core safety boundary: a coding agent may fix mechanical gaps, but an authority gap — approval/idempotency evidence it cannot prove, a weakened policy, a touched trust root — must route to a human so the agent cannot invent its way to green (reward hacking).

Routing (a pure projection of the head scan, never a model judgment)

| Condition | actor | safe_to_attempt |
| --- | --- | --- |
| `mergeable` | (no fix_task) | - |
| every gating finding `autofix_safe` | `coding_agent` | `true` |
| any gating finding `requires_human_review` | `human` | `false` |
| `policy_weakened` / `trust_root_touched` | `human` | `false` |
| `insufficient_evidence` / `unknown` | `human` | `false` |

Routing is by the per-finding autofix_safe signal, not the verdict label — a blocked-but-mechanical PR can still route to the agent, and a review-required PR with an authority gap routes to a human.
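
The routing table above can be read as a small pure function. The following is a minimal sketch, not the code in cli/verify/fix_task.py: the `Finding` class, the `route()` name, and its parameters are hypothetical, and treating an empty list of gating findings as a human route is an assumption. The strict `is True` / `is False` checks follow the fail-closed rule described in the review-feedback commit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    # Hypothetical stand-in for a gating finding; only the two routing
    # signals named in the PR description are modeled here.
    autofix_safe: Optional[bool] = None
    requires_human_review: Optional[bool] = None


# Conditions that the routing table always sends to a human.
HUMAN_ONLY_VERDICTS = {"insufficient_evidence", "unknown"}


def route(
    verdict: str,
    findings: List[Finding],
    policy_weakened: bool = False,
    trust_root_touched: bool = False,
) -> Optional[Tuple[str, bool]]:
    """Return (actor, safe_to_attempt), or None when no fix_task is needed."""
    if verdict == "mergeable":
        return None
    if policy_weakened or trust_root_touched or verdict in HUMAN_ONLY_VERDICTS:
        return ("human", False)
    # Fail closed: every gating finding must be explicitly mechanical.
    # None/False routing fields count as an authority gap.
    # Assumption: no gating findings at all also routes to a human.
    mechanical = bool(findings) and all(
        f.autofix_safe is True and f.requires_human_review is False
        for f in findings
    )
    return ("coding_agent", True) if mechanical else ("human", False)
```

Because the function reads only the verdict, the two change flags, and the per-finding signals, the same inputs always produce the same route, which is the property the PR describes as "never a model judgment".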

Changes

- VerifierFixTask schema (actor, safe_to_attempt, instructions[], forbidden_shortcuts[], verification_command) with a validator enforcing actor="human" ⇒ safe_to_attempt=False (a human-authority task can never be marked agent-safe).
- cli/verify/fix_task.py build_fix_task(): deterministic projection; forbidden_shortcuts are the anti-reward-hacking guardrails (no suppression, no severity-lowering, no inventing evidence, no weakening the policy that evaluates the change).
- Orchestrator wiring; first_next_action.actor now routes through the same fix_task so the two agent-facing signals can't disagree.
- PR comment renders fix_task as the authoritative "Required before merge" block (falls back to the prior path when absent).
- Regenerated docs/verifier-schema.v0.1.json (additive optional field).
- tests/test_fix_task_contract.py (13 tests).
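
To illustrate the schema invariant listed above, here is a minimal sketch that uses a standard-library dataclass in place of the project's actual model. The class name `FixTaskSketch`, the field types, and the defaults are assumptions; only the five field names and the rule actor="human" ⇒ safe_to_attempt=False come from this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FixTaskSketch:
    # Field names follow the PR description; types and defaults are assumed.
    actor: str  # "coding_agent" or "human"
    safe_to_attempt: bool
    instructions: List[str] = field(default_factory=list)
    forbidden_shortcuts: List[str] = field(default_factory=list)
    verification_command: str = ""

    def __post_init__(self) -> None:
        if self.actor not in ("coding_agent", "human"):
            raise ValueError(f"unknown actor: {self.actor!r}")
        # Invariant from the PR: a human-authority task is never agent-safe.
        if self.actor == "human" and self.safe_to_attempt:
            raise ValueError("actor='human' requires safe_to_attempt=False")
```

Putting the check in the constructor means an inconsistent task cannot be built at all, rather than relying on every caller to remember the rule.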

Verification

- Full suite: 2316 passed, 4 skipped, 0 failed
- python scripts/generate_schemas.py --check: clean
- ruff check: clean

🤖 Generated with Claude Code

…n guards

The verify cycle (M1-M3) already computes one release verdict in build_release_decision() and projects it onto the report summaries and the agent-facing merge_verdict, but the discipline was enforced only by convention and docstrings. This makes it structural.

- Define ReleaseDecisionStatus once in schemas/common.py and reuse it for AgentSummary.verdict, ReviewerSummary.verdict, VerifierVerdict, and ReleaseConsequence.decision (previously four hand-respelled Literals of the same vocabulary). Generated JSON schemas are byte-identical (generate_schemas.py --check clean) - no wire change, no schema bump.
- Type _DECISION_TO_VERDICT as dict[ReleaseDecisionStatus, MergeVerdict] and add a totality test, so a new release status without a mapping fails CI instead of silently falling back to human_review_required.
- Add a VerifierArtifact model_validator: when a head release_decision is present, merge_verdict and the decision copy MUST be exact projections of it - an inconsistent artifact is impossible to construct.
- Centralize the no-decision verdict rule in merge_verdict_for(); delete the divergent inline _merge_verdict in the orchestrator (summaries defaulted to "passed", verify defaulted to "mergeable"/"unknown" - now one rule).
- Add tests/test_verdict_contract.py pinning canonical-enum reuse across all verdict surfaces, projection totality + the exact table, the fail-safe (unknown status never auto-passes), and the validator.

Full suite: 2302 passed, 4 skipped. Behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
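
The totality check described in this commit can be sketched with `typing.Literal` and `typing.get_args`. This is an illustration only: the status values, the mapping entries, and the helper names below are hypothetical, since the commit does not list the real ReleaseDecisionStatus vocabulary or the exact projection table.

```python
from typing import Dict, Literal, Set, get_args

# Hypothetical vocabularies; the real Literal members are not shown in this PR.
ReleaseStatus = Literal["passed", "blocked", "review_required", "insufficient_evidence"]
MergeVerdict = Literal[
    "mergeable", "blocked", "human_review_required", "insufficient_evidence", "unknown"
]

# Assumed mapping, for illustration only.
DECISION_TO_VERDICT: Dict[ReleaseStatus, MergeVerdict] = {
    "passed": "mergeable",
    "blocked": "blocked",
    "review_required": "human_review_required",
    "insufficient_evidence": "insufficient_evidence",
}


def missing_mappings() -> Set[str]:
    """Statuses declared in the Literal but absent from the mapping."""
    return set(get_args(ReleaseStatus)) - set(DECISION_TO_VERDICT)


def verdict_for(status: str) -> str:
    # Fail-safe: an unrecognized status never auto-passes.
    return DECISION_TO_VERDICT.get(status, "human_review_required")
```

A test asserting that `missing_mappings()` is empty fails as soon as a new status is added to the Literal without a matching table entry, so the gap shows up in CI rather than in the fallback path.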

…repair routing

fix_task is the single repair instruction a verify run hands to whoever acts next. Routing is a pure projection of the head scan - never a model judgment: a coding agent may fix mechanical gaps (every gating finding is autofix_safe), but any authority gap (approval/idempotency evidence it cannot prove, a weakened policy, a touched trust root, or degraded evidence) routes to a human so the agent cannot invent its way to green.

- Add VerifierFixTask {actor, safe_to_attempt, instructions, forbidden_shortcuts, verification_command} with a validator: an actor='human' task can never be marked safe_to_attempt (the anti-reward-hacking guarantee).
- Add cli/verify/fix_task.py build_fix_task(): projects release_decision plus per-finding autofix_safe / requires_human_review into the task; policy_weakened, trust_root_touched, and insufficient_evidence force the human route.
- Wire fix_task into the verifier artifact and route first_next_action.actor through the same fix_task so the two agent-facing signals never disagree.
- Render fix_task as the authoritative "Required before merge" block in the PR comment (falls back to the prior agent-summary path when absent).
- Regenerate docs/verifier-schema.v0.1.json (additive optional field).
- Add tests/test_fix_task_contract.py.

Full suite: 2316 passed, 4 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… single-source next action

Addresses review feedback on #146:

- Fail closed in the coding-agent route: require every gating finding to be explicitly mechanical (autofix_safe is True AND requires_human_review is False). A finding with None/False routing fields (stale/plugin/legacy) is now treated as an authority gap and routed to a human, never silently marked safe_to_attempt.
- Shell-quote the refs in verification_command with shlex.quote (a valid git ref can contain ';', so the unquoted command was injectable); render it through the backtick-stripping _code helper so the PR comment stays Markdown-safe.
- first_next_action borrows the agent-summary action only when its implied actor agrees with fix_task; otherwise actor, command, and why are all derived from fix_task so the two agent-facing signals cannot contradict.
- Emit a human fix_task for a non-mergeable verdict with no head report (unknown), so the routing table holds and every non-mergeable verdict carries a fix_task.

Full suite: 2323 passed, 4 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
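
The shell-quoting fix in this commit can be illustrated with the standard-library `shlex.quote`. In this sketch the command name `verify-placeholder`, its flags, and the function name are hypothetical stand-ins, not the project's real CLI; only the use of `shlex.quote` on the refs comes from the commit message.

```python
import shlex


def build_verification_command(base_ref: str, head_ref: str) -> str:
    # "verify-placeholder" and its flags are hypothetical, for illustration.
    # Each ref is quoted so shell metacharacters such as ';' stay inside
    # a single argument instead of starting a new command.
    return "verify-placeholder --base {} --head {}".format(
        shlex.quote(base_ref), shlex.quote(head_ref)
    )


# A ref containing ';' is wrapped in single quotes; a plain ref is unchanged.
command = build_verification_command("main", "feat/x;id")
```

Splitting `command` back with `shlex.split` yields the original refs as single arguments, which is a convenient property to pin in a regression test.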

* Surface the self-approval prohibition at the top of verifier.json

When a PR weakens the release policy or touches a trust root, a coding agent must not silently self-approve a change to its own gate. That prohibition was only present inside a fix_task instruction (PR #146); promote it to the two fields an agent reads first.

- Add _self_approval_note(): the explicit "a coding agent cannot self-approve that change - a human must review it" message for policy_weakened (taking precedence) and trust_root_touched.
- verifier.json headline leads with the note when present.
- human_review.why leads with the note, and a self-approval note forces human_review.required=True regardless of the verdict path.

Full suite: 2346 passed, 4 skipped. No schema change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Self-approval: keep all top-level convenience fields consistent (review fix)

Addresses review of #148: a self-approval note forced human_review.required=True, but can_merge_without_human and first_next_action still keyed only off merge_verdict, so the defensive (mergeable + note) path could emit "human review required" and "safe to merge" at once.

- _can_merge_without_human returns False whenever a self-approval note exists.
- _first_next_action routes to a human review (never the "safe to merge" action) when a self-approval note is present, including the fix_task-None defensive case.
- Both thread capability_review from _build_verifier.

Clean mergeable behavior (no note) is unchanged; covered by a regression test.

Full suite: 2349 passed, 4 skipped. No schema change.
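
The consistency rule from these two commits can be sketched as follows. The function names, the dictionary keys, the `verdict_requires_human` parameter, and the sentence prefixes are all hypothetical; the quoted core message, the precedence of policy_weakened over trust_root_touched, and the rule that a note overrides the mergeable path come from the commit messages.

```python
from typing import Dict, Optional

# Core wording quoted in the commit message; the prefixes below are assumed.
CORE_NOTE = "a coding agent cannot self-approve that change - a human must review it"


def self_approval_note(policy_weakened: bool, trust_root_touched: bool) -> Optional[str]:
    # policy_weakened takes precedence over trust_root_touched.
    if policy_weakened:
        return "This PR weakens the release policy; " + CORE_NOTE
    if trust_root_touched:
        return "This PR touches a trust root; " + CORE_NOTE
    return None


def convenience_fields(
    merge_verdict: str,
    note: Optional[str],
    verdict_requires_human: bool = False,
) -> Dict[str, object]:
    has_note = note is not None
    return {
        # A note forces human review regardless of the verdict path.
        "human_review_required": verdict_requires_human or has_note,
        # Never "safe to merge" while a self-approval note exists.
        "can_merge_without_human": merge_verdict == "mergeable" and not has_note,
        # None means "defer to the normal fix_task routing" in this sketch.
        "next_actor": "human" if has_note else None,
    }
```

Deriving all three fields from the same `has_note` flag is what prevents the defensive (mergeable + note) case from reporting "human review required" and "safe to merge" together.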

pengfei-threemoonslab and others added 3 commits May 29, 2026 22:41

Base automatically changed from feat/verdict-contract-lock to main May 30, 2026 06:32

pengfei-threemoonslab merged commit 6c2c416 into main May 30, 2026
1 check passed

pengfei-threemoonslab mentioned this pull request May 30, 2026

Surface the self-approval prohibition at the top of verifier.json #148

Merged

pengfei-threemoonslab merged 3 commits into main from feat/verifier-fix-task
